Stochastic Gradient Descent
- Updates the weights using a single randomly chosen training example instead of the full dataset (see the sketch after this list).
- The updates are very noisy
- but each step is cheap and fast
- often converges much faster overall than full-batch gradient descent in machine learning
- because training samples are redundant: many examples carry similar gradient information, so a single example is already an informative (if noisy) estimate of the full gradient
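A minimal sketch of the single-example update, assuming linear regression with squared-error loss on synthetic data. The problem size, learning rate, and step count are illustrative assumptions, not from the source:

```python
import numpy as np

# Assumed setup: linear model y ≈ X @ w with squared-error loss.
rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.1 * rng.normal(size=n)

w = np.zeros(d)   # weights to learn
lr = 0.01         # learning rate (assumed value)

for step in range(5000):
    i = rng.integers(n)              # pick ONE random example
    pred = X[i] @ w
    grad = (pred - y[i]) * X[i]      # gradient of 0.5 * (pred - y_i)^2 w.r.t. w
    w -= lr * grad                   # noisy but very cheap update

print(np.linalg.norm(w - true_w))    # error should be small after training
```

Each iteration touches one row of `X`, so the per-step cost is O(d) regardless of dataset size; the noise in `grad` averages out over many steps.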

- The main reason for mini-batching is that hardware processes batches more efficiently than single examples
- the per-example gradient computations are embarrassingly parallel, and batching is the simplest way to exploit that parallelism (vectorized matrix operations, GPUs), as in the sketch below
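For contrast, a sketch of the mini-batch variant under the same assumed setup as above (the batch size of 32 is an arbitrary choice): one vectorized matrix multiply computes the gradient contributions of the whole batch at once, which is exactly the workload parallel hardware is good at.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, batch = 1000, 5, 32
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.1 * rng.normal(size=n)

w = np.zeros(d)
lr = 0.01

for step in range(2000):
    idx = rng.integers(n, size=batch)       # sample a mini-batch of indices
    Xb, yb = X[idx], y[idx]
    # One matrix multiply handles all 32 examples in parallel;
    # averaging over the batch reduces the gradient noise.
    grad = Xb.T @ (Xb @ w - yb) / batch
    w -= lr * grad

print(np.linalg.norm(w - true_w))
```

The statistical update is the same averaged gradient either way; batching changes how the arithmetic maps onto hardware, not what is being optimized.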